All HF Hub posts

prithivMLmods 
posted an update 1 day ago
OpenAI, Google, Hugging Face, and Anthropic have released guides and courses on building agents, prompting techniques, scaling AI use cases, and more. Below are 10+ concise guides and courses that may help you make progress. 📖

• Agents Companion: https://www.kaggle.com/whitepaper-agent-companion
• Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
• Guide to building agents by OpenAI: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
• Prompt engineering by Google: https://www.kaggle.com/whitepaper-prompt-engineering
• 601 real-world gen AI use cases by Google: https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders
• Prompt engineering by IBM: https://www.ibm.com/think/topics/prompt-engineering-guide
• Prompt Engineering by Anthropic: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
• Scaling AI use cases: https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf
• Prompting Guide 101: https://services.google.com/fh/files/misc/gemini-for-google-workspace-prompting-guide-101.pdf
• AI in the Enterprise by OpenAI: https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf

By Hugging Face 🤗:
• AI Agents Course by Hugging Face: https://huggingface.co/learn/agents-course/unit0/introduction
• smolagents Docs: https://huggingface.co/docs/smolagents/en/tutorials/building_good_agents
• MCP Course by Hugging Face: https://huggingface.co/learn/mcp-course/unit0/introduction
• Other Courses (LLM, Computer Vision, Deep RL, Audio, Diffusion, Cookbooks, etc.): https://huggingface.co/learn
clem 
posted an update 3 days ago
Today, we're unveiling two new open-source AI robots! HopeJR for $3,000 & Reachy Mini for $300 🤖🤖🤖

Let's go open-source AI robotics!
ginipick 
posted an update 1 day ago
🎨 AI Hairstyle Changer - Transform with 93 Styles! 💇‍♀️✨

🚀 Introduction
Experience 93 different hairstyles and 29 hair colors in real-time with your uploaded photo!
Transform your look instantly with this AI-powered Gradio web app.


✨ Key Features

📸 3 Simple Steps
Upload Photo - Upload a front-facing photo
Select Style - Choose from 93 hairstyles
Pick Color - Click your desired color from the 29-color palette (a rough interface sketch follows below)
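
Under the hood this is a Gradio app that wires those three inputs into a single generation call. A minimal sketch of that wiring, with a hypothetical apply_hairstyle function standing in for the Space's actual model, might look like this:

```python
import gradio as gr

HAIRSTYLES = ["Pixie Cut", "Bob", "French Braid", "Messy Bun"]  # small subset of the 93
HAIR_COLORS = ["Black", "Auburn", "Rose Gold", "Ash Blonde"]    # small subset of the 29

def apply_hairstyle(photo, style, color):
    # Hypothetical placeholder: the real Space calls an image-editing model here.
    return photo

demo = gr.Interface(
    fn=apply_hairstyle,
    inputs=[
        gr.Image(type="pil", label="Upload a front-facing photo"),
        gr.Dropdown(HAIRSTYLES, label="Hairstyle"),
        gr.Radio(HAIR_COLORS, label="Hair color"),
    ],
    outputs=gr.Image(label="Result"),
    title="AI Hairstyle Changer",
)

if __name__ == "__main__":
    demo.launch()
```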


💫 Diverse Hairstyles (93 types)

🎯 Short Cuts: Pixie Cut, Bob, Lob, Crew Cut, Undercut
🌊 Waves: Soft Waves, Hollywood Waves, Finger Waves
🎀 Braids: French Braid, Box Braids, Fishtail Braid, Cornrows
👑 Updos: Chignon, Messy Bun, Top Knot, French Twist
🌈 Special Styles: Space Buns, Dreadlocks, Mohawk, Beehive

🎨 Hair Color Palette (29 colors)

🤎 Natural Colors: Black, Browns, Blonde variations
❤️ Red Tones: Red, Auburn, Copper, Burgundy
💜 Fashion Colors: Blue, Purple, Pink, Green, Rose Gold
⚪ Cool Tones: Silver, Ash Blonde, Titanium

🌟 Key Advantages

⚡ Fast Processing: Get results in just 10-30 seconds
🎯 High Accuracy: Natural-looking transformations with AI technology
💎 Professional Quality: High-resolution output suitable for social media
🔄 Unlimited Trials: Try as many combinations as you want
📱 User-Friendly: Intuitive interface with visual color palette


💡 Perfect For

💈 Salon Consultations: Show clients potential new looks before cutting
🛍️ Personal Styling: Experiment before making a big change
🎭 Entertainment: Fun transformations for social media content
🎬 Creative Projects: Character design and visualization
👗 Fashion Industry: Match hairstyles with outfits and makeup
📸 Photography: Pre-visualization for photoshoots

LINK: ginipick/Change-Hair
VirtualOasis 
posted an update 1 day ago
Agent Mesh
Agent Mesh is an exciting framework where autonomous AI agents collaborate in a connected ecosystem, sharing information and dynamically tackling complex tasks. Think of it as a network of smart agents working together seamlessly to get things done!

Collaboration: Agents share tasks and data, boosting efficiency.
Scalability: Easily add new agents to handle bigger challenges. (A toy sketch of the idea follows below.)
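
As a toy illustration of the concept only (not Agent Mesh's actual API), a mesh can be modeled as a shared registry of agents that advertise skills, with tasks routed to whichever peer can handle them:

```python
# Toy sketch of the agent-mesh idea: agents advertise skills in a shared registry,
# and tasks are routed to whichever peer offers the needed skill.
# Hypothetical code for illustration; not the framework's real interface.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    name: str
    skills: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def handle(self, skill: str, payload: str) -> str:
        return self.skills[skill](payload)


class Mesh:
    def __init__(self) -> None:
        self.agents: List[Agent] = []

    def register(self, agent: Agent) -> None:
        # Scalability: adding capacity is just registering another agent.
        self.agents.append(agent)

    def dispatch(self, skill: str, payload: str) -> str:
        # Route the task to the first agent that advertises the skill.
        for agent in self.agents:
            if skill in agent.skills:
                return agent.handle(skill, payload)
        raise LookupError(f"no agent offers skill {skill!r}")


mesh = Mesh()
mesh.register(Agent("researcher", {"search": lambda q: f"notes about {q}"}))
mesh.register(Agent("writer", {"summarize": lambda t: t[:60] + "..."}))

notes = mesh.dispatch("search", "agent meshes")  # handled by the researcher
print(mesh.dispatch("summarize", notes))         # handled by the writer
```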

ginipick 
posted an update about 16 hours ago
🎨 FLUX VIDEO Generation - All-in-One AI Image/Video/Audio Generator

🚀 Introduction
FLUX VIDEO Generation is an all-in-one AI creative tool that generates images, videos, and audio from text prompts, powered by an NVIDIA H100 GPU for lightning-fast processing!

ginigen/Flux-VIDEO

✨ Key Features
1️⃣ Text → Image → Video 🖼️➡️🎬

Generate high-quality images from Korean/English prompts
Transform still images into natural motion videos
Multiple size presets (Instagram, YouTube, Facebook, etc.)
Demo: 1-4 seconds / Full version: up to 60 seconds

2️⃣ Image Aspect Ratio Change 🎭

Freely adjust image aspect ratios
Expand images with outpainting technology
5 alignment options (Center, Left, Right, Top, Bottom)
Real-time preview functionality

3️⃣ Video + Audio Generation 🎵

Add AI-generated audio to videos
Korean prompt support (auto-translation)
Context-aware sound generation
Powered by MMAudio technology

πŸ› οΈ Tech Stack

Image Generation: FLUX, Stable Diffusion XL (a minimal FLUX sketch follows this list)
Video Generation: TeaCache optimization
Audio Generation: MMAudio (44kHz high-quality)
Outpainting: ControlNet Union
Infrastructure: NVIDIA H100 GPU for ultra-fast generation
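
For reference, the text-to-image leg of that stack could be reproduced with diffusers' FluxPipeline. The Space's exact FLUX checkpoint and settings aren't stated in the post, so the ones below are assumptions:

```python
import torch
from diffusers import FluxPipeline

# Assumed checkpoint; the Space's actual FLUX variant isn't specified.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a cinematic sunset over Seoul, ultra-detailed",  # Korean prompts would be translated first
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=3.5,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("still_frame.png")  # this still would then feed the image-to-video stage
```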

💡 How to Use

Select your desired tab
Enter your prompt (Korean/English supported!)
Adjust settings
Click the generate button

🎯 Use Cases

📱 Social media content creation
🎥 YouTube Shorts/Reels
📊 Presentation materials
🎨 Creative artwork
🎵 Background sound generation
openfree 
posted an update 3 days ago
πŸŽ™οΈ Voice Clone AI Podcast Generator: Create Emotionally Rich Podcasts with Your Own Voice!

🚀 Project Introduction
Hello! Today we're excited to introduce an AI-powered solo podcast generator that delivers high-quality voice cloning with authentic emotional expression.
Transform any PDF document, web URL, or keyword into a professional podcast with just a few clicks! 📚➡️🎧

VIDraft/Voice-Clone-Podcast

✨ Key Features
1. 🎯 Multiple Input Methods

URL: Simply paste any blog or article link
PDF: Upload research papers or documents directly
Keyword: Enter a topic and AI searches for the latest information to create content

2. 🎭 Emotionally Expressive Voice Cloning
Powered by Chatterbox TTS:

🎤 Voice Cloning: Learns and replicates your unique voice
📢 Natural intonation and emotional expression
🌊 Customizable emotion intensity with the Exaggeration control
⚡ Seamless handling of long texts with automatic chunking (a minimal usage sketch follows below)
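
A minimal usage sketch along the lines of the Chatterbox TTS README (exact parameter names may differ in your installed version, so treat this as an approximation):

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# Load the pretrained Chatterbox model (assumed to run on a CUDA device).
model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Welcome back to the show. Today we unpack a new paper on voice cloning."

# Clone a voice from a short reference clip and dial up the emotional intensity.
wav = model.generate(
    text,
    audio_prompt_path="my_voice_sample.wav",  # hypothetical reference recording
    exaggeration=0.7,                          # emotion-intensity control
)
ta.save("podcast_segment.wav", wav, model.sr)
```

Long scripts would be split into chunks, each synthesized this way, and the segments concatenated into the final episode.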

3. 🤖 State-of-the-Art LLM Script Generation

Professional-grade English dialogue using Private-BitSix-Mistral
12 natural conversational exchanges
Real-time web search integration for up-to-date information
Fully editable generated scripts! ✏️

💡 Use Cases
📖 Educational Content

Transform complex research papers into easy-to-understand podcasts
Create English learning materials in your own voice

📰 News & Information

Convert international articles into engaging audio content
Produce global trend analysis podcasts

🎨 Creative Content

Tell stories in English with your own voice
Build your global personal brand with custom audio content

πŸ› οΈ Tech Stack
🧠 LLM: Llama CPP + Private-BitSix-Mistral
πŸ—£οΈ TTS: Chatterbox (Voice Cloning & Emotional Expression)
πŸ” Search: Brave Search API
πŸ“„ Document Processing: LangChain + PyPDF
πŸ–₯️ Interface: Gradio
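
A minimal sketch of that document-processing leg, using LangChain's PyPDF loader plus a text splitter to produce TTS-sized chunks. The chunk sizes and file names are assumptions, not the Space's actual settings:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Pull raw text out of an uploaded PDF.
loader = PyPDFLoader("paper.pdf")  # hypothetical input file
pages = loader.load()
full_text = "\n".join(page.page_content for page in pages)

# 2. Split it into chunks so each TTS call stays short.
splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=50)
chunks = splitter.split_text(full_text)

# 3. Each chunk would then be turned into dialogue by the LLM and voiced by
#    Chatterbox TTS, and the audio segments concatenated into the podcast.
for i, chunk in enumerate(chunks):
    print(f"chunk {i}: {len(chunk)} characters")
```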
🎉 What Makes Us Special

🎤 Voice Cloning: Perfect voice replication from just a short audio sample
😊 Emotion Control
📝 Unlimited Length
🔄 Real-time Updates
dhruv3006 
posted an update 2 days ago
C/ua Cloud Containers: Computer-Use Agents in the Cloud

First cloud platform built for Computer-Use Agents. Open-source backbone. Linux/Windows/macOS desktops in your browser. Works with OpenAI, Anthropic, or any LLM. Pay only for compute time.

Our beta users have deployed 1000s of agents over the past month. Available now in 3 tiers: Small (1 vCPU/4GB), Medium (2 vCPU/8GB), Large (8 vCPU/32GB). Windows & macOS coming soon.

GitHub: https://github.com/trycua/cua (we are open source!)

Cloud Platform: https://www.trycua.com/blog/introducing-cua-cloud-containers

merve 
posted an update 2 days ago
HOT: MiMo-VL, new 7B vision LMs by Xiaomi surpassing GPT-4o (March) and competitive in GUI agentic + reasoning tasks ❤️‍🔥 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212

Not only that, but they're also MIT-licensed & usable with transformers 🔥
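
Since the checkpoints work with transformers, loading should follow the standard image-text-to-text path. A rough sketch; the checkpoint name and chat format below are assumptions, so check the model cards in the collection:

```python
from transformers import pipeline

# Assumed checkpoint name from the MiMo-VL collection; verify on the Hub.
pipe = pipeline("image-text-to-text", model="XiaomiMiMo/MiMo-VL-7B-RL")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/screenshot.png"},  # placeholder image
            {"type": "text", "text": "What should I click to open the settings menu?"},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])
```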
MonsterMMORPG 
posted an update 3 days ago
VEO 3 FLOW Full Tutorial - How To Use VEO 3 in FLOW: https://youtu.be/AoEmQPU2gtg

VEO 3 is rocking the generative AI field right now. FLOW is the platform that lets you use VEO 3 with many cool features. This is an official tutorial and guide made by the Google team; I edited it slightly. I hope it is helpful.

FLOW: https://labs.google/flow/about

Veo 3 is Google DeepMind’s most advanced video generation model to date. It allows users to create high-quality, cinematic video clips from simple text prompts, making it one of the most powerful AI tools for video creation. What sets Veo 3 apart is its ability to generate videos with native audio. This means that along with stunning visuals, Veo 3 can produce synchronized dialogue, ambient sounds, and background musicβ€”all from a single prompt. For filmmakers, this is a significant leap forward, as it eliminates the need for separate audio generation or complex syncing processes. Veo 3 also excels in realism, accurately simulating real-world physics and ensuring precise lip-syncing for characters, making the generated content feel remarkably lifelike.

Introducing Flow: AI Filmmaking Made Seamless

While Veo 3 handles the heavy lifting of video and audio generation, Flow is the creative interface that brings it all together. Flow is Google’s new AI filmmaking tool, custom-designed to work with Veo 3, as well as Google’s other advanced models like Gemini (for natural language processing) and Imagen (for text-to-image generation). Flow is built to be intuitive, allowing filmmakers to describe their ideas in everyday language and see them transformed into cinematic scenes. It offers a suite of features that give creators unprecedented control over their projects, from camera movements to scene transitions, all while maintaining consistency across clips.
BFFree 
posted an update 2 days ago
I am a shy artist, primarily because I don't get motivation from sharing art publicly. I see so much new art online every day that once I begin thinking about where I fit in, the mental fatigue becomes counterproductive for me.

Recently I shared an album of hundreds of creations with a friend (and singular art fan), and he asked some questions that I felt were interesting enough to prompt this post about my process, what it teaches me, and what I am seeking.

Specifically, I have learned to take ink drawings and create renderings that reveal my actual intention. My digital art goal is to work natural details into imagined characters and landscapes that reflect my affection for abstraction, deconstruction, and humor.

My drawing goals are to be humorous and crafty about how things can be rendered just slightly incorrectly, so the viewer sees something familiar and recognizable even when it's nonsense.

My process uses hysts/ControlNet-v1-1 with Lineart, 50 steps, and a guidance scale of 14, and I give minimal descriptions that are often plain. Example: "Really real old dog, plant, and another old dog, with an alligator turtle, posing for a photography portrait".
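
As a rough sketch, the same lineart-conditioned step could be reproduced locally with diffusers. The base model and detector checkpoints below are assumptions (the Space may use different ones), but the steps and guidance scale match the settings above:

```python
import torch
from controlnet_aux import LineartDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Extract a lineart map from the scanned ink drawing (hypothetical local file).
detector = LineartDetector.from_pretrained("lllyasviel/Annotators")
drawing = load_image("ink_drawing.png")
lineart = detector(drawing)

# Stable Diffusion 1.5 + the v1.1 lineart ControlNet (assumed checkpoints).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Settings from the post: 50 steps, guidance scale 14, a plain prompt.
image = pipe(
    "Really real old dog, plant, and another old dog, with an alligator turtle, "
    "posing for a photography portrait",
    image=lineart,
    num_inference_steps=50,
    guidance_scale=14.0,
).images[0]
image.save("render.png")
```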

In the past few months I started taking the ControlNet render to multimodalart/flux-style-shaping and mashing up styles. Here I used a portrait of a tortoise and a dog lying next to each other on a reflective tile floor.

Last night, I took the Flux output and had it described using WillemVH/Image_To_Text_Description, which was very accurate given the image.

I then fed the prompt back into Alpha-VLLM/Lumina-Image-2.0

The last step confirmed why I prefer using sketches to language. One, I am a visual artist, so I have much better nuance with drawings than with words. Two, my mind's eye looks for the distorted. Three, MORE FUN.


